/********************************************************************************
Discrimination in Multi-Phase Systems: Evidence from Child Protection

Created on: 12/28/2022
Last Modified on: 2/13/2024

Description: This program generates the screener analysis sample at the 
child x call level.

Note that we have removed the file directory names from this program for 
confidentiality reasons.
********************************************************************************/

**************************
**(0) SETUP
**************************
clear
set more off
macro drop _all
capture log close
set seed 02042023

*Set directories 
global cleandata 
global tmpdata 
global output 


**************************
**GENERATE SAMPLE AND REMAINING VARIABLES NEEDED FOR SUBSEQUENT ANALYSES
**************************
use "${tmpdata}all_hotline_calls_main_restrictions_qje.dta", clear
**Drop screeners with 100 or fewer calls (the condition below keeps n > 100) and investigators with fewer than 200 investigations
bysort screener: gen n = _N 
keep if n >100 

cap drop n 
bysort worker_id: gen n = _N if screened==1
drop if n<200 & screened==1

**Additional sample restrictions needed for the analysis (sexual abuse cases are not randomly assigned)
drop if sexab==1 

**Zip code is a key variable in the analysis: convert it to numeric and drop screened-in calls with a missing zip code
replace zipcode_vic="." if zipcode_vic=="NA"
destring zipcode_vic, replace 
drop if zipcode_vic==. & screened==1
sum inv6m

**Keep only calls (and investigations) made at least 365 days after the same child's previous call; each child's first call has a missing diff and is kept
sort childpartyid cw_date_stata intake_id, stable 
order childpartyid cw_date_stata 
by childpartyid: gen diff = cw_date_stata[_n] - cw_date_stata[_n-1] 
order diff 
drop if diff<365 
sum inv6m 

**Generate screener unique numeric identifiers 
drop screener 
egen screener = group(scrnr_first_nm scrnr_last_nm) //162 screeners for now
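*Optional check (a sketch, not part of the original program): egen's group()
*returns missing when either name is missing, so count such calls and display
*the number of distinct screeners for comparison with the count noted above.
*These lines only report values; they do not change the data.
count if missing(screener)
quietly summarize screener
display "Number of screeners: " r(max)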

**Generate rotation variable: exact day X shift fixed effect 
*Extract the HH:MM portion of the complaint timestamp (characters 12-16)
gen cw_time = substr(complaint_dttm,12,5)
order cw_time 

gen cw_hour = substr(cw_time,1,2)
order cw_hour 
destring cw_hour, replace 

tab cw_hour
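*Recode the midnight hour (00) as 24 so that it falls in the evening shift
replace cw_hour=24 if cw_hour==0 

*Shift 1: hours 8-16; shift 2: hours 17-24 (including midnight); shift 3: hours 1-7
gen shift=. 
replace shift=1 if cw_hour >=8 & cw_hour<=16 
replace shift=2 if cw_hour >16 & cw_hour<=24 
replace shift=3 if cw_hour >=1 & cw_hour<=7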

egen rotation = group(cw_date_stata shift)
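*Optional sanity check (a sketch, not part of the original program): the three
*hour ranges above should cover every non-missing hour, so no call with a valid
*hour should be left without a shift. "capture noisily" reports a violation
*without stopping the run, and the tabulation shows the shift distribution.
capture noisily assert !missing(shift) if !missing(cw_hour)
tab shift, missing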
			
**Generate remaining variables needed for summary statistics and label 
replace child_age ="" if child_age=="NA"
destring child_age, replace 
replace child_age=0 if child_age<0
replace child_age = round(child_age)
rename child_age age 
gen miss_age = age==.
replace age=18 if age==. //missing covariate adjustment: impute 18; miss_age flags these observations

gen female = child_sex=="f" 
gen miss_female = child_sex=="NA" | child_sex=="u"

gen neglect = phyneg==1 | medneg==1 | impsup==1 | failprot==1 
gen other_reporter = edu==0 & family==0 & medical==0 & counselor==0 & law==0 & court==0 & mdhhs==0 & miss_reporter==0 & provider==0 

gen double cw_time_stata = clock(cw_time,"hm") //clock() values are in milliseconds and should be stored as double
order cw_time cw_time_stata 

**Subsequent investigation within six months, only among those left at home
replace inv6m =. if fc==1

*Generate screening instrument, following the description in Arnold, Dobbie, and Yang (QJE, 2018)
cap drop n 
bysort screener black: gen n = _N
gen n_leaveout = n - 1
order n_leaveout

*Residualize the screening variable on rotation fixed effects (fchat holds the residualized screen-in indicator)
reghdfe screened, absorb(rotation) resid
predict fchat, resid
bysort screener black: egen fc_sum = total(fchat)
order fc_sum

*Calculate leave-one-out average 
gen fc_leaveout = fc_sum - fchat
gen z = fc_leaveout/n_leaveout
sum z 
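*Note on the construction above (an explanatory sketch, not part of the
*original program): for call i in the cell defined by screener j and value b
*of black, the instrument is
*    z_i = (sum of fchat over all calls in cell (j,b) - fchat_i) / (n_jb - 1),
*i.e. the mean rotation-residualized screen-in rate over the screener's other
*calls in the same cell. z is missing when the cell contains a single call
*(division by zero) or when fchat itself is missing. The lines below only
*report how often that happens; they do not change the data.
count if missing(z)
capture noisily assert missing(z) if n_leaveout==0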

**Label variables for tables:
label var white "White" 
label var female "Female" 
label var age "Age during investigation" 
label var phyab "Physical abuse allegation" 
label var neglect "Neglect allegation" 
label var maltreatment "Maltreatment allegation" 
label var edu "Education personnel" 
label var law "Law enforcement"
label var family "Family member" 
label var medical "Medical personnel"
label var counselor "Counselor/Therapist"
label var mdhhs "MDHHS"
label var provider "Provider"
label var other_reporter "Other"
label var court "Court"
label var screened "Screen-in rate" 
label var fc "Foster care placement rate"
label var inv6m "Re-investigated within 6 months"

save "${cleandata}screener_analysis_sample_child_call.dta", replace